Dynamic Vision Sensor integration on FPGA-based CNN accelerators for high-speed visual classification
Deep learning is a cutting-edge technique that is being applied to many fields.
For vision applications, Convolutional Neural Networks (CNNs) achieve
significant accuracy in classification tasks. Numerous hardware accelerators
have appeared in recent years to improve on CPU- or GPU-based solutions.
This technology is commonly prototyped and tested on FPGAs before being
considered for ASIC fabrication for mass production. The use of typical
commercial cameras (30 fps) limits the capabilities of these systems for
high-speed applications. Dynamic vision sensors (DVS), which emulate the
behavior of a biological retina, are becoming increasingly important for these
applications because of their nature: the information is represented by a
continuous stream of spikes, and the frames to be processed by the CNN are
constructed by collecting a fixed number of these spikes (called events). The
faster an object moves, the more events the DVS produces, and so the higher the
equivalent frame rate. Using a DVS therefore allows frames to be computed at
the maximum speed a CNN accelerator can offer. In this paper we
present a VHDL/HLS description of a pipelined FPGA design able to collect
events from an Address-Event-Representation (AER) DVS retina and obtain a
normalized histogram to be used by a particular CNN accelerator, called
NullHop. VHDL is used to describe the circuit, and HLS is used for the
computation blocks that perform the frame normalization needed by the CNN.
Results outperform previous implementations of frame collection and
normalization using ARM processors running at 800 MHz on a Zynq7100 in both
latency and power consumption. A measured 67% speedup factor is presented for a
Roshambo CNN real-time experiment running at a 160 fps peak rate.
Comment: 7 page
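The frame construction summarized in this abstract (collect a fixed number of events into a histogram, then normalize it) can be illustrated with a small software sketch. This is not the paper's VHDL/HLS design. The function names, the `(x, y)` event format, and normalization by the peak bin count are all assumptions made for illustration, since the abstract does not specify the normalization method.

```python
def events_to_frames(events, width, height, events_per_frame):
    """Yield one normalized histogram per fixed-size batch of events.

    `events` is any iterable of (x, y) pixel addresses (hypothetical format).
    Normalizing by the peak bin count is an assumption, not the paper's method.
    """
    hist = [[0] * width for _ in range(height)]
    count = 0
    for x, y in events:
        hist[y][x] += 1          # one more event seen at this pixel
        count += 1
        if count == events_per_frame:
            peak = max(max(row) for row in hist)   # > 0 because count >= 1
            yield [[c / peak for c in row] for row in hist]
            hist = [[0] * width for _ in range(height)]
            count = 0            # start collecting the next frame


def equivalent_frame_rate(event_rate_hz, events_per_frame):
    """Frames per second produced when every frame holds a fixed event count."""
    return event_rate_hz / events_per_frame


if __name__ == "__main__":
    stream = [(0, 0), (0, 0), (1, 0), (1, 1)]
    for frame in events_to_frames(stream, width=2, height=2, events_per_frame=4):
        print(frame)
    print(equivalent_frame_rate(100_000, 1_000))
```

Because each frame closes after a fixed number of events rather than a fixed time, a faster-moving object (higher event rate) yields a proportionally higher equivalent frame rate, which is the property the abstract relies on.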
Efficient DMA transfers management on embedded Linux PSoC for Deep-Learning gestures recognition: Using Dynamic Vision Sensor and NullHop one-layer CNN accelerator to play RoShamBo
This demonstration shows a Dynamic Vision Sensor able
to capture visual motion at a speed equivalent to a high-speed
camera (20k fps). The collected visual information is presented as a
normalized histogram to a CNN hardware accelerator, called
NullHop, that is able to process a pre-trained CNN to
play Roshambo against a human. The CNN designed for this
purpose consists of 5 convolutional layers and a fully connected
layer. The latency for processing one histogram is 8 ms. NullHop is deployed
on the FPGA fabric of a PSoC from Xilinx, the Zynq 7100, which
is based on a dual-core ARM computer and a Kintex-7 with 444K
logic cells, integrated in the same chip. The ARM computer runs
Linux, and a dedicated C++ controller runs the whole
demo. This controller runs in user space in order to extract the
maximum throughput through an efficient use of the AXIStream,
based on DMA transfers. The short delay needed to process one
visual histogram allows us to average several consecutive
classification outputs, which gives a more reliable estimate of the symbol
that the user presents to the visual sensor. This output is then
mapped to present the winning symbol within the 60 ms latency
that the brain considers acceptable before thinking that there is a
trick.
Ministerio de Economía y Competitividad TEC2016-77785-
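The averaging of consecutive classification outputs described in this abstract can be sketched as follows. With the stated 8 ms per histogram and a 60 ms response budget, at most floor(60 / 8) = 7 classifications fit if they run back to back with no other overhead (an assumption). The class and function names, and the use of a simple sliding-window mean over per-class scores, are illustrative choices rather than the demo's actual controller logic.

```python
from collections import deque


def frames_within_budget(budget_ms, latency_ms):
    """How many sequential classifications fit in the response budget."""
    return int(budget_ms // latency_ms)


class AveragedClassifier:
    """Keep the last `window` score vectors and report the class with the best mean."""

    def __init__(self, window):
        self.scores = deque(maxlen=window)   # oldest entry is dropped automatically

    def update(self, class_scores):
        self.scores.append(list(class_scores))
        n = len(self.scores)
        # Column-wise mean: one averaged score per class.
        means = [sum(col) / n for col in zip(*self.scores)]
        best = max(range(len(means)), key=means.__getitem__)
        return best, means


if __name__ == "__main__":
    window = frames_within_budget(60, 8)
    clf = AveragedClassifier(window)
    # Hypothetical per-frame scores for three symbols.
    for scores in ([0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1]):
        best, means = clf.update(scores)
    print(window, best)
```

Averaging over a short window smooths out single-frame misclassifications while still answering inside the latency budget.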
Unified description of structure and reactions: implementing the Nuclear Field Theory program
The modern theory of the atomic nucleus results from the merging of the
liquid drop model (Niels Bohr and Fritz Kalckar) and the shell model (Maria
Goeppert Mayer and J. Hans D. Jensen), which contributed the concepts of
collective excitations and of independent-particle motion, respectively. The
unification of these apparently contradictory views in terms of the
particle-vibration (rotation) coupling (Aage Bohr and Ben Mottelson) has
allowed for an increasingly complete, accurate and detailed description of
nuclear structure, with Nuclear Field Theory (NFT, developed by the
Copenhagen-Buenos Aires collaboration) providing a powerful quantal embodiment.
In keeping with the fact that reactions are not only at the basis of quantum
mechanics (statistical interpretation, Max Born), but also the specific tools
to probe the atomic nucleus, NFT is being extended to deal with processes which
involve the continuum in an intrinsic fashion, so as to be able to treat them
on an equal footing with those associated with discrete states (nuclear
structure). As a result, spectroscopic studies of transfer to continuum states
could eventually profit from the NFT rules, extended to take care of recoil
effects. In the present contribution we review the implementation of the NFT
program of structure and reactions, placing special emphasis on open problems and
outstanding predictions.
Comment: submitted to Physica Scripta to the Focus Issue on Nuclear Structure:
Celebrating the 1975 Nobel Priz
Momentum distributions in halo nuclei
From the analogy between the break-up of weakly bound, neutron-rich nuclei and the phenomenon of optical diffraction, it is possible to formulate a model for the momentum distribution of both the core and the valence neutrons of halo nuclei which displays a simple dependence on nuclear structure parameters. The model is applied to the analysis of reactions where ¹¹Be, ¹¹Li and ¹⁴Be impinge on ¹²C, providing an overall account of the experimental findings and predictions for further measurements.
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the
art in artificial intelligence, gaining wide popularity in both industry
and academia. Of special interest are Convolutional Neural Networks
(CNNs), which take inspiration from the hierarchical structure
of the visual cortex to form deep layers of convolutional operations,
along with fully connected classifiers. Hardware implementations of these
deep CNN architectures are challenged by memory bottlenecks, since their
many convolutional and fully connected layers demand a large
amount of communication for parallel computation. Multi-core CPU-based
solutions have proven inadequate for this problem
due to the memory wall and low parallelism. Many-core GPU architectures
show superior performance, but they consume high power and also
have memory constraints due to inconsistencies between cache and main
memory. OpenCL is commonly used to describe these architectures for
execution on GPGPUs or FPGAs. FPGA design solutions are also
actively being explored; they allow the memory hierarchy to be implemented
using embedded parallel BlockRAMs. This boosts the parallel use
of shared memory elements between multiple processing units, avoiding
data replication and inconsistencies. This makes FPGAs potentially
powerful solutions for real-time classification with CNNs. In this
paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx
as pseudo-automatic development solutions are evaluated. A comprehensive
evaluation and comparison for a 5-layer deep CNN is presented.
Hardware resources, temporal performance and the OpenCL architecture
for CNNs are discussed. Xilinx demonstrates faster synthesis, better
FPGA resource utilization and more compact boards. Altera provides
multi-platform tools, a mature design community and better execution
times.
Ministerio de Economía y Competitividad TEC2016-77785-
Beneficial effects of dietary supplementation with green tea catechins and cocoa flavanols on aging-related regressive changes in the mouse neuromuscular system
This work was supported by Abbott and a grant from the Spanish Ministerio de Ciencia, Innovacion y Universidades cofinanced by Fondo Europeo de Desarrollo Regional (RTI2018-099278-B-I00 to JC and JE).
Besides skeletal muscle wasting, sarcopenia entails morphological and molecular changes in distinct components of the neuromuscular system, including spinal cord motoneurons (MNs) and neuromuscular junctions (NMJs); moreover, noticeable microgliosis has also been observed around aged MNs. Here we examined the impact of two flavonoid-enriched diets containing either green tea extract (GTE) catechins or cocoa flavanols on age-associated regressive changes in the neuromuscular system of C57BL/6J mice. Compared to control mice, GTE and cocoa supplementation significantly improved the survival rate of mice, reduced the proportion of fibers with lipofuscin aggregates and central nuclei, and increased the density of satellite cells in skeletal muscles. Additionally, both supplements significantly augmented the number of innervated NMJs and their degree of maturity compared to controls. GTE, but not cocoa, prominently increased the density of VAChT and VGluT2 afferent synapses on MNs, which were lost in control aged spinal cords; conversely, cocoa, but not GTE, significantly augmented the proportion of VGluT1 afferent synapses on aged MNs. Moreover, GTE, but not cocoa, reduced aging-associated microgliosis and increased the proportion of neuroprotective microglial phenotypes. Our data indicate that certain plant flavonoids may be beneficial in the nutritional management of age-related deterioration of the neuromuscular system.
Abbott Laboratories; Spanish Ministerio de Ciencia, Innovacion y Universidades; European Commission RTI2018-099278-B-I0
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Of special interest are Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, since their many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored; they allow the memory hierarchy to be implemented using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification with CNNs. Both Altera and Xilinx have adopted the OpenCL co-design framework from GPUs for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of the Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.